Feasibility and reliability of online vs in-person cognitive testing in healthy older people

Background
Early evidence suggests that online cognitive assessments could offer a feasible and resource-efficient alternative to in-person clinical assessments for evaluating cognitive performance, yet there is currently little understanding of how these assessments relate to traditional, in-person cognitive tests.

Objectives
In this preliminary study, we assess the feasibility and reliability of NeurOn, a novel online cognitive assessment tool. NeurOn measures various cognitive domains including processing speed, executive functioning, spatial working memory, episodic memory, attentional control, visuospatial functioning, and spatial orientation.

Design
Thirty-two participants (mean age: 70.19 years) completed two testing sessions, one unsupervised online and one in-person, one week apart. The order of the testing sessions was randomised. In both sessions, participants completed questionnaires prior to a cognitive assessment. Test-retest reliability and concurrent validity of the online cognitive battery were assessed using intraclass correlation coefficients (ICCs) and correlational analysis, respectively, by comparing performance on repeated tasks across testing sessions and against traditional, in-person cognitive tests.

Results
Global cognition on the NeurOn battery correlated moderately with MoCA performance, and the battery demonstrated moderate test-retest reliability. Concurrent validity was found only between the online and paper versions of the Trail Making Test-A, and for global cognitive performance between online and in-person testing sessions.

Conclusions
The NeurOn cognitive battery is a promising tool for measuring cognitive performance online, both longitudinally and across short retesting intervals, in healthy older adults. Given their cost-effectiveness, flexible administration, and improved accessibility for wider populations, online cognitive assessments show promise for future screening of neurodegenerative diseases.


Introduction
Cognitive functioning is vital for everyday behaviour. It is well established that many aspects of cognition typically decline with ageing [1,2]. It is also increasingly evident that subtle cognitive changes appear long before clinical manifestations of neurodegenerative diseases such as dementia become apparent [3]. Therefore, it is important to understand whether these cognitive changes are indicative of neurodegenerative disease symptomatology or are typical of healthy ageing. Earlier diagnosis of cognitive impairment is invaluable for patients and caregivers as it can inform management of patient wellbeing and provide targeted measures for lifestyle modifications to potentially reverse cognitive decline [4,5].
Neuropsychological testing is therefore required to measure changes in cognitive functioning [6,7]. Nonetheless, routine cognitive assessments in healthy ageing are rarely conducted and typically rely upon quick-to-administer paper-based tests [8]. The current gold-standard tests for assessing cognitive impairment, such as the Mini-Mental State Examination (MMSE), were developed to screen for dementia, but are less sensitive in identifying milder cognitive impairment [9,10]. Furthermore, clinic assessments involving paper-based tests are limited as they are prone to practice effects [11], and cognitive changes may be masked by fluctuations in cognitive performance or differences in cognitive reserve [12].
In recent years, significant developments in online cognitive testing have increased its usage in both research and clinical environments [13]. Notably, online assessments can be performed remotely, improving the accessibility and frequency of cognitive testing and enabling the identification of more subtle changes in cognitive decline [14,15]. Digital assessments also provide enhanced precision in data measurement, standardised presentation, pseudorandomisation to reduce practice effects, and greater cost-efficiency [13,16]. Most computerised testing to date has focussed upon processing speed and attention tasks, with many demonstrating promising results [17-19]. A recent systematic review found early evidence suggesting that computerised cognitive testing shows potential clinical utility in diagnosing neurocognitive disorders. However, there has been limited validation work on cognitive batteries, which is necessary to establish whether they are feasible for clinical applications [20]. To date, evidence on the agreement between digitalised and traditional, paper-based neuropsychological tests has been mixed, with some studies showing considerable agreement [21,22] and others showing little agreement [17,23]. Additionally, the test-retest reliability of performance on these digital tests is not yet well established.
Comprehensive neuropsychological test batteries that assess a variety of cognitive domains are required to detect early cognitive changes that may manifest in older age. Normative data for healthy older adults are needed to interpret cognitive performance in the context of sociodemographic factors, so that at-risk populations can be accurately identified. Our group recently developed NeurOn, a novel cognitive battery, as part of the DECISION study [24]. NeurOn is a comprehensive cognitive battery testing a variety of cognitive domains and is novel in that it also assesses spatial orientation ability, which previous work from our group has shown to be a key signature for preclinical dementia [25].
In the present study, we aim to evaluate the psychometric properties of the NeurOn battery by measuring its reliability and validity in both supervised in-person and unsupervised online settings against established traditional neuropsychological assessments. It is hypothesised that online cognitive tasks will demonstrate test-retest reliability over a one-week period; that online/remote cognitive tasks will demonstrate concurrent validity with in-person/traditional cognitive task equivalents; and that cognitive performance in the neuropsychological battery will validate against established clinical tests in measuring cognitive performance.

Recruitment
Thirty-three older adults (65+) were recruited from the community via online and offline advertisements to take part in the study. All participants were pre-screened to assess whether they were cognitively and physically healthy; had any history of psychiatric or neurological disease or of substance abuse disorder; drove once per week or more; and had previously taken part in a study using the online cognitive platform. Recruitment and testing of participants took place between 1st October 2022 and 30th March 2023. Written informed consent was obtained from each participant and data were attributed anonymously. Ethical approval for the study was provided by the Faculty of Medicine and Health Sciences Research Ethics Committee at the University of East Anglia (FMH2019/20-134).
To ensure adequate statistical power, a power analysis was conducted for evaluating the test-retest reliability and concurrent validity of the cognitive testing battery. A total sample size of 32 (degrees of freedom = 31) was determined for the test-retest reliability analysis, using a matched-pairs t-test, with a power of 0.95 and a critical t score of 1.70. The analysis was powered at a 0.05 alpha error probability, assuming a moderate effect size of 0.6.
This sample size was also deemed sufficient for powering the analysis of concurrent validity, assuming a large effect size of 0.50, an alpha error of 0.05, a power of 0.94, and a critical t score of 1.70.
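The reported power for the test-retest analysis can be sanity-checked with a rough calculation. The sketch below is not the authors' procedure: it assumes a one-sided paired t-test and replaces the noncentral t distribution with a normal approximation, so dedicated power software will give slightly different values. The function name `approx_paired_t_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_paired_t_power(effect_size: float, n: int, critical_t: float) -> float:
    """Approximate one-sided power of a paired t-test.

    Assumption: the noncentral t distribution is approximated by a normal
    distribution, so power ~= Phi(d * sqrt(n) - t_crit).
    """
    noncentrality = effect_size * sqrt(n)
    return NormalDist().cdf(noncentrality - critical_t)


# Values reported in the text: n = 32, effect size 0.6, critical t = 1.70.
power = approx_paired_t_power(effect_size=0.6, n=32, critical_t=1.70)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the approximation lands close to the stated power of 0.95, which is consistent with a 0.05 (one-sided) alpha level.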

Procedure
Screening was carried out via online video call (32) and telephone (1) by the study team prior to baseline cognitive assessment. One participant was excluded from the study as they only completed one testing session due to illness, and therefore 32 participants were retained for analysis (mean age: 70.19 years). Participants were randomised to the order in which they completed testing sessions. Prior to the baseline appointment, participants were asked with which device they would most comfortably complete the remote assessment appointment (desktop, laptop, tablet), and the device was matched for the in-person testing appointment. Both testing sessions started with completion of questionnaires pertaining to demographics, subjective cognition, and driving history. Each participant completed the follow-up testing session one week from the baseline testing session at the same time of day.

Development and description of the online cognitive testing platform
Questionnaires and cognitive tasks were hosted on NeurOn, an online platform. The novel cognitive battery was developed by a professional programmer alongside the project team. Online neuropsychological tests were based on a combination of established, traditional neuropsychological tests and established novel tasks (Virtual Supermarket Task) and were designed to be completed unsupervised, in unmonitored conditions. Tasks were accompanied by written instructions and video tutorials with a voice-over (except for the Go/No-Go test) prior to test completion to promote multimodal learning. After receiving instructions, participants completed practice sessions for each task to ensure they were prepared for the actual test. Participants were encouraged to complete the main test battery in one session without breaks but were advised to take a break prior to the Virtual Supermarket Task due to its significantly longer duration and greater task difficulty. If the cognitive test battery was interrupted (e.g. by a participant taking a break or an internet disconnection), participants resumed the task from their current progress upon logging back in. All tasks were pseudorandomised to allow repeated testing. All participant input was saved on a protected server throughout each test element.

Online cognitive tasks
The NeurOn battery consisted of a variety of digitalised tasks that measure cognition across domains that are sensitive to age-related cognitive impairment. A Reaction Time task, whereby participants responded as quickly as possible to a repeating on-screen stimulus, measured visuomotor speed (milliseconds). Trail Making Test-A, involving the connecting of 25 numerically arranged points in ascending order as quickly as possible, measured processing speed (seconds). Trail Making Test-B, involving the connecting of 25 points of alternating numbers and letters in ascending order as quickly as possible, measured executive functioning (seconds). The Episodic Memory task involved a stimulus encoding phase of everyday objects appearing consecutively in varying screen locations, followed by a delayed testing phase where participants decided whether a stimulus was shown previously (measuring recognition memory, % correct), and, if so, its screen position (measuring source memory, % correct). A Spatial Span-Backwards task measured spatial working memory (maximum number correctly recalled), whereby participants recalled, in reverse order, a sequence of lit-up boxes ranging in length from 2 to 9. The Go/No-Go task measured attentional control (number of errors) by asking participants to respond to a specific stimulus (Go) and inhibit responses to other stimuli (No-Go). The Fragmented Letters task assessed visuospatial functioning (% correct) by asking participants to identify a single letter of the alphabet that is fragmented through a visual mask. Finally, the Virtual Supermarket Task, previously described in detail [26], measured allocentric and egocentric orientation (both as deviation error from the correct location) by asking participants to orient a trolley in a virtual supermarket according to a previously presented video clip. Detailed task descriptions are available in S1 Table in S1 File.

Remote cognitive testing
Participants completed the remote cognitive testing session from their own home. Initially, participants completed demographics and novel subjective cognition questionnaires (Spatial Memory & Driving, Orienteering, and Navigation). Participants then completed the online cognitive test battery, consisting of the Reaction Time task, Trail Making Test-A, Trail Making Test-B, Picture Recognition, Spatial Span-Backwards, Go/No-Go test, Fragmented Letters, and Virtual Supermarket Task.

In-person cognitive testing
The in-person cognitive testing session took place in a quiet testing facility and involved a combination of traditional neuropsychological tests, requiring face-to-face assessment, with our novel online tasks. Participants initially completed established questionnaires measuring subjective cognition (Cognitive Change Index (CCI) [27] and Santa Barbara Sense of Direction (SBSOD) [28]). Participants then completed the Montreal Cognitive Assessment (MoCA) [29], Reaction Time task (Online), paper versions of the Trail Making Test A & B [30], Rey Osterrieth Complex Figure Test (ROCF)-delayed recall [31], Corsi Block Tapping Test [32], Go/No-Go (Online), a paper version of the Fragmented Letters test [33], and finally the Virtual Supermarket Task (Online).

Statistical analyses
Neuropsychological test measures. To create an episodic memory measure for the online cognitive battery, the recognition and source memory percentages were averaged for each participant. Outliers were identified through boxplot analysis, and participants were excluded from a test if their average values deviated more than 3 standard deviations from the mean. For the remote session, outliers were removed for Reaction Time (1), Trail Making Test-A (1), and Spatial & Memory-Lifetime (1). For the in-person testing session, outliers were removed for the CCI (1), Reaction Time (1), Trail Making Test-A (1), Trail Making Test-B (1), ROCF recall (1), and Go/No-Go (1). Two participants did not complete the Virtual Supermarket Task in either test session due to either a technical error or finding the task too difficult, and therefore were removed from analysis. One participant did not complete the Picture Recognition task due to a technical error. A Bonferroni-adjusted significance level of 0.00625 (0.05/8) was used to assess statistical significance in correlations between the CCI and online cognitive assessments. Raw cognitive test scores were standardised for regression analysis, except for episodic memory, which was converted into a proportion as this score measured accuracy as a percentage. Appropriate diagnostic tests and visual inspections were carried out to assess regression assumptions, including linearity and homoscedasticity, normality of residuals, independence of residuals, and multicollinearity. All analysis was carried out in R (version 4.4.0).
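The 3-standard-deviation exclusion rule can be illustrated with a short sketch. The authors ran their analysis in R and combined this rule with boxplot inspection; the Python below covers only the numeric rule, and both the function name `exclude_outliers` and the reaction-time values are hypothetical.

```python
from statistics import mean, stdev


def exclude_outliers(values, n_sd=3.0):
    """Split values into (kept, removed) using a mean +/- n_sd * SD rule."""
    m, s = mean(values), stdev(values)  # sample SD (n - 1 denominator)
    kept = [v for v in values if abs(v - m) <= n_sd * s]
    removed = [v for v in values if abs(v - m) > n_sd * s]
    return kept, removed


# Hypothetical reaction times (ms) for 32 participants, one of them extreme.
reaction_times = [500 + i for i in range(31)] + [5000]
kept, removed = exclude_outliers(reaction_times)
print(len(kept), removed)
```

Note that the extreme value itself inflates the mean and standard deviation, so in very small samples this rule can fail to flag even a gross outlier; that is one reason visual inspection is a useful complement.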

Concurrent validity (remote vs in-person testing).
Concurrent validity was measured by Spearman or Pearson correlations (depending on variable distribution) to assess the consistency between remote/online and in-person/traditional neuropsychological tests. A correlation threshold of ≥0.40 was used to establish acceptable concurrent validity [34]. The online Trail Making Tests were compared to the paper Trail Making Tests; the Spatial Span-Backwards task was compared with the Corsi Block Tapping test; the Picture Recognition task was compared with the ROCF-delayed recall task; Fragmented Letters was compared with the paper Fragmented Letters task; and Global Cognition was compared with MoCA score.
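As an illustration of this comparison, the sketch below computes Pearson and Spearman coefficients for paired scores and checks them against the 0.40 threshold. It is a minimal standard-library Python sketch rather than the authors' R code; the helper names (`pearson`, `average_ranks`, `spearman`, `meets_threshold`) and the example scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def meets_threshold(r, threshold=0.40):
    """True if the coefficient reaches the acceptable-validity threshold."""
    return r >= threshold


# Hypothetical completion times (seconds) for the same five participants.
online = [32.0, 41.5, 28.0, 55.0, 47.2]
paper = [30.0, 44.0, 35.0, 52.0, 40.0]
r = pearson(online, paper)
rho = spearman(online, paper)
print(round(r, 3), round(rho, 3), meets_threshold(r))
```

Spearman's coefficient is the natural choice when one of the variables is clearly non-normal, since it depends only on rank order.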
Test-retest reliability. To examine test-retest reliability of the repeated online cognitive tasks from baseline to retest sessions, two complementary approaches were used:
1. Two-way mixed effects intraclass correlation coefficients (ICCs) with measures of absolute agreement (95% CI) according to McGraw & Wong [35].
2. Paired samples t tests to assess performance differences. A significant (p < .05) improvement over time was used as a threshold to indicate practice effects.
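The absolute-agreement ICC in the first approach can be sketched from the two-way ANOVA mean squares. The text does not state whether single or average measures were used, so the sketch below assumes the single-measure form, ICC(A,1) in McGraw & Wong's notation, and omits the confidence interval. The function name `icc_a1` and the example scores are hypothetical.

```python
def icc_a1(scores):
    """Single-measure, absolute-agreement ICC for an n x k score table.

    scores: list of rows, one row per participant, one column per session.
    Formula: (MSR - MSE) / (MSR + (k - 1) * MSE + (k / n) * (MSC - MSE)).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants (rows), sessions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical scores: every participant improves by exactly one point.
shifted = [[1, 2], [2, 3], [3, 4]]
print(round(icc_a1(shifted), 3))
```

Because absolute agreement penalises systematic shifts between sessions, a uniform improvement lowers this ICC even though the rank order of participants is perfectly preserved. This is why it pairs well with the paired t test for practice effects.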
Global cognition. To establish a global cognition score for each testing session, Z-scores for each neuropsychological measure within each testing session were averaged to create a composite score. Z-scores were reversed where necessary to ensure consistent directionality across tasks.
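A composite of this kind can be sketched as follows. This is an illustrative Python version, not the authors' R code. It assumes that measures where lower scores mean better performance (times, error counts) are the ones sign-reversed, and that every participant has a score on every measure. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def global_composite(measures, lower_is_better):
    """Average per-measure z-scores into one composite per participant.

    measures: dict mapping measure name -> list of scores, with
        participants in the same order in every list.
    lower_is_better: set of measure names whose z-scores are sign-reversed
        so that higher always means better performance.
    """
    z_by_measure = []
    for name, scores in measures.items():
        z = z_scores(scores)
        if name in lower_is_better:
            z = [-v for v in z]
        z_by_measure.append(z)
    n_participants = len(z_by_measure[0])
    return [mean(z[i] for z in z_by_measure) for i in range(n_participants)]


# Hypothetical data for three participants.
measures = {
    "tmt_a_seconds": [30.0, 40.0, 50.0],  # lower is better
    "episodic_pct": [90.0, 80.0, 70.0],   # higher is better
}
composite = global_composite(measures, lower_is_better={"tmt_a_seconds"})
print([round(c, 2) for c in composite])
```

In this toy example the first participant is fastest and most accurate, so both reversed and unreversed z-scores point the same way and the composite ranks that participant highest.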

Demographics and cognitive battery characteristics
To complete the NeurOn cognitive battery, 41% of participants used desktops, 41% used laptops, and 18% used tablet devices. On average, the online testing session took 58 minutes and 50 seconds whilst the in-person testing session took 66 minutes and 50 seconds. No significant differences were found between males and females in age, education, MoCA score, CCI score, or time taken to complete the online and in-person testing batteries (see Table 1).

Concurrent validity (remote vs. in-person testing)
To determine how online cognitive tests validated against traditional cognitive tests, concurrent validity was measured for online tasks with traditional cognitive test equivalents. Only Trail Making Test-A met the correlation threshold for acceptable concurrent validity between tasks, r(28) = 0.615, p < .001. Low correlations were found for Trail Making Test-B (r(29) = 0.255, p = 0.17), Spatial Working Memory (r(30) = 0.268, p = 0.14), and Episodic Memory (ρ = 0.269, p = 0.16, N = 29). A ceiling effect was observed for the Fragmented Letters task in both paper and online versions across both testing sessions (see Table 2).

Association with established cognitive assessments
To determine how the online cognitive testing battery is associated with established cognitive assessments, correlation analysis was carried out between individual cognitive tests and total CCI score. Spearman rank correlation analysis found that higher CCI score was associated with worse egocentric orientation performance, r(27) = -.453, p = .014; however, this was not statistically significant after Bonferroni correction. No other cognitive assessments were found to correlate with the CCI (see Table 4). Correlation analysis was then conducted to establish whether global cognitive performance from the online cognitive battery validated against the MoCA. A Pearson's correlation found that global cognition performance showed a moderate positive correlation with MoCA performance, r(24) = .598, p = .001 (Fig 1).
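The Bonferroni decision for the egocentric orientation result is simple arithmetic, sketched below using the values reported in the text (eight comparisons, p = .014). The function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


threshold = bonferroni_threshold(alpha=0.05, n_comparisons=8)
p_value = 0.014  # reported for CCI vs egocentric orientation

significant_uncorrected = p_value < 0.05
significant_corrected = p_value < threshold
print(threshold, significant_uncorrected, significant_corrected)
```

The p-value falls below the conventional 0.05 level but above the corrected level of 0.00625, which is why the association is reported as not significant after correction.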

Discussion
With a growing ageing population, there is an urgent need to establish screening tools for early identification of cognitive decline during ageing. This preliminary study assessed the feasibility, reliability, and validity of a novel online cognitive testing battery to establish its applicability in acquiring cognitive performance data from healthy older adults in unsupervised, remote settings. Importantly, we demonstrate that global performance in the cognitive battery validates against the MoCA, one of the most popular tests for screening for mild cognitive impairment (MCI). We also demonstrate that egocentric orientation performance was the only cognitive domain associated with ratings on the CCI. As predicted, we establish test-retest reliability of the battery, as all repeated tests showed moderate test-retest reliability and no practice effects were present after a one-week washout period between testing sessions. Finally, we explore factors that influence performance in online cognitive assessments and find that older age is associated with worse processing speed and allocentric orientation performance.
Due to individual differences in cognitive trajectories during ageing, composite measures assessing a range of cognitive domains have been suggested as the most appropriate approach to screen for and track cognitive impairment over time [36]. Within the present study, we demonstrate that global cognitive performance in the online cognitive battery shows a moderate correlation with global cognition measured by the MoCA. To date, very few studies have validated online cognitive assessments in older adults [37], and fewer still have shown that online cognitive assessments provide comparable diagnostic accuracy to the MoCA [38]. Our results indicate that the NeurOn battery is a promising instrument for measuring cognitive performance remotely, with accuracy similar to that of clinical testing appointments.
Many traditional cognitive assessments, such as the MoCA and MMSE, are limited by practice effects, which may compromise the ability to interpret whether cognitive change is due to task experience rather than ageing effects [39,40]. Practice effects are more likely to occur within shorter testing intervals and are prominent across one-week retesting intervals [41,42]. In the present study, despite a short retesting period of one week, no statistically significant improvement was found across any of the repeated tasks. This lack of improvement may be due to the pseudorandomisation of task material within the NeurOn battery, which prevents participants from learning task-specific content. Although a one-week retesting period is not typically clinically relevant for neuropsychological testing [42], it can be valuable for assessing cognitive changes after short-term intervention studies [43]. Reduced practice effects also enable the identification of subtle changes in cognitive trajectories longitudinally; such longitudinal assessments are rarely conducted in routine clinical appointments because they are resource intensive.
Contrary to our hypotheses, we found that only Trail Making Test-A and global cognitive performance demonstrated concurrent validity with traditional paper-based tasks. Previous research shows that concurrent validity of online cognitive testing is typically modest (median 0.49) [17], and therefore correspondence between online cognitive tests and paper-based tasks is typically moderate at best. It is possible that digitalising some traditional paper-based tasks influences test performance, and therefore comparing online cognitive test performance to non-computerised normative data may be less valid in assessing cognitive impairment. Nevertheless, due to the enhanced precision, standardisation, and objectivity in data measurement offered by online cognitive testing, computerised cognitive tasks can be used to develop new normative data thresholds that can assess more sensitively for cognitive changes. Furthermore, online testing opens the possibility of testing a significantly larger and more diverse population who may not have access to clinical assessments. By establishing extensive normative datasets, it is possible to establish how cognitive changes over time differ across specific subpopulations, which enables more accurate diagnostic markers [44]. Given that age-related variability in cognitive performance increases rapidly after age 60 [45], it is essential to account for sociodemographic factors that may influence interpretation of cognitive trajectories.
Whilst all repeated computerised tasks in the NeurOn battery demonstrated moderate test-retest reliability, some cognitive test parameters showed more reliability than others, with the Go/No-Go task demonstrating the lowest test-retest reliability in the battery. This aligns with previous studies showing that the Go/No-Go task has modest test-retest reliability compared with other impulsivity measures [46] and that performance changes across testing sessions [47]. Lower reliability in the Go/No-Go task relative to other tasks may result from the nature of the task, with attentional control and motor disinhibition being inherently susceptible to the effects of mental fatigue, leading to increased errors and longer response times [48]. Older adults also typically exhibit more variability in motor control [49], and therefore the task may be more susceptible to these errors than tasks that do not require rapid response times. This is supported by our finding that the highest test-retest reliability scores were for egocentric orientation, which requires less immediate motor activity and may be less affected by short-term fluctuations in attentional control. Indeed, previous research has shown that egocentric orientation has the highest test-retest reliability across spatial orientation tasks (ICC = 0.72, similar to our finding of ICC = 0.75) [50]. Future cognitive battery studies should consider these factors when selecting and designing tasks. It may be beneficial to explore methods to enhance the reliability of tasks like the Go/No-Go. Strategies could include optimising task design, implementing more robust practice trials, and controlling for external factors that impact attentional and motor performance. Additionally, future research should focus on developing and validating new tasks that balance sensitivity to cognitive changes with high reliability, particularly in diverse and older adult populations.
In the present study, we found that egocentric orientation was the only cognitive test to correlate with CCI score, which is commonly used to identify subjective cognitive decline (SCD) [51]. Previous research has established that individuals with SCD typically show spatial orientation deficits [52,53], although little is known about how this relates to performance across other cognitive tasks. The present findings indicate that egocentric orientation deficits may be a key signature for SCD, supporting growing evidence for spatial orientation performance as a marker for early cognitive impairment [25]. SCD typically manifests prior to preclinical dementia [54], yet there is large heterogeneity in the outcomes of SCD, with many individuals experiencing SCD without objective cognitive impairments [55]. As the only test associated with worse subjective cognition was egocentric orientation, future research may look to establish whether individuals with SCD who exhibit worse egocentric orientation abilities are at greater risk of future cognitive impairment. However, as the finding was no longer significant after correction for multiple comparisons, further investigation is required to establish its association with early cognitive impairment.
Overall, the novel cognitive battery demonstrates usability and feasibility for measuring cognitive performance remotely, as all participants were able to complete the assessment unsupervised at home using a variety of devices. Our battery has also previously demonstrated feasibility and internal consistency in collecting large longitudinal normative cognitive data across regions (unpublished data). Many cognitive assessments, such as the MoCA, are limited in their generalisability across different cultures due to their reliance on language and cultural understanding [56]. A strength of the NeurOn battery is that its tasks are visual and do not require language, allowing for greater cross-cultural generalisation in cognitive performance and advancing global dementia screening efforts [57].
Although online cognitive testing has several advantages over in-person clinical assessments, diagnosing cognitive impairments, such as MCI, requires functional and clinical evaluations [58] and therefore should not take place outside of a clinical setting. Currently, online cognitive testing may provide a pre-screening tool for more extensive clinical assessments, such as neuroimaging and biomarker testing. Additionally, online cognitive testing can advance research by increasing the scale of epidemiological studies [59] and screening participants for eligibility in clinical trials [60].
Although our results are promising, this study has some limitations. First, we did not account for computer skill, which has previously been found to relate to better cognitive task performance [18,61]. Second, the present study did not proactively target a diverse population in recruitment, which is important for the validation of cognitive testing. Third, our sample size of 32 was relatively small, and therefore more research is necessary to comprehensively understand how sociodemographic factors influence neuropsychological tests within the NeurOn battery. Fourth, our sample consisted of healthy individuals, and therefore there is currently little understanding as to the feasibility of the NeurOn battery within patient population groups. Research is currently ongoing to examine how NeurOn test performance differs across healthy ageing, preclinical dementia, and early dementia. Finally, unsupervised cognitive testing has inherent drawbacks, such as a lack of standardisation in home testing. Consequently, it is possible that participant performance may be influenced by confounding factors such as distraction. However, participants were provided with clear instructions to mitigate these issues.
In conclusion, the NeurOn cognitive assessment battery is a promising instrument for assessing cognitive performance within healthy older adult populations. In the present study, the NeurOn battery compared well with MoCA performance; showed negligible practice effects; and was easily administered in unsupervised remote testing environments. Future research in online cognitive assessments should look to establish appropriate testing timepoints to sensitively measure longitudinal changes in cognitive functioning in wider sociodemographic samples.

Table 1. Validation study participant demographic characteristics.
a Welch two-sample t-tests were conducted for group differences.
b Chi-squared test was used to assess overall group differences in devices used.
c Abbrev: MoCA = Montreal Cognitive Assessment, CCI = Cognitive Change Index.
d Cramér's V was used for effect size of devices used. Cohen's d was used for effect sizes for other variables.
https://doi.org/10.1371/journal.pone.0309006.t001

Table 3. Test-retest reliability of cognitive tasks between online and in-person testing sessions.
a Spearman's ρ was used for Go/No-Go correlation as the online test score showed a non-normal distribution.